Abstract

This point paper shows how the Information Resource Dictionary
System (IRDS) can fulfill critical design and operational
requirements for CALS Phase II. First, a series of assumptions
is made about the data management services needed by CALS
Phase II. Next, these assumptions are used to develop a series
of requirements for a dictionary system. The structure of the
IRDS family of standards is then described, with examples that
illustrate how the IRDS could meet the requirements. A schedule
is presented to show that the IRDS and other data management
standards will be available when needed to meet the immediate
requirements of CALS. An architecture is presented to
illustrate the additional standards required to achieve the
longer-range goal of distributed database management. Finally,
development tasks are recommended.
1. Assumptions

The  following  assumptions  are  made  about  the  mechanisms  needed 
to  achieve  CALS  Phase  II: 

o A logically integrated database is absolutely necessary.

o A single physical database is impractical; multiple
physical databases controlled by multiple DBMSs are
required.

o A three-schema architecture is the only practical way of
achieving control over such a database.

o All activity of the DBMSs must be logically controlled
by a logically single representation of the three-schema
architecture.

o A single physical representation of the three-schema
architecture is impractical; multiple, coordinated,
geographically distributed physical representations are
required.

o The development and maintenance of the three-schema
architecture will be extremely complex and will require a
very large volume of data.

o To add further complication, a three-schema architecture,
since it represents a union of different disciplines (as
represented by different external schemas), may be
developed using more than one modeling methodology.
2. Requirements for a Dictionary

Given  the  preceding  assumptions,  the  following  may  be  concluded: 

o A mechanism is required to actively supply three-schema
information to the various DBMSs.

o A mechanism is required to coordinate separate sources
of such three-schema information.

o A mechanism is required to support data modeling and
schema maintenance.

o A mechanism may be required to support multiple
methodologies for performing the data modeling process;
that is, the supporting mechanism may need to be neutral
with respect to methodology.

Coordinated active dictionaries that are neutral with respect
to data modeling methodology are one mechanism for satisfying
the preceding requirements. Such dictionaries must support four
distinct processes:

o They must have a standard interface to provide
three-schema information to a variety of database
management systems.

o They must support the process of coordinating
communication to control the DBMSs.

o They must support the process of describing the
vocabulary needed by the chosen data modeling
methodologies.

o They must support the process by which each methodology
populates a dictionary using its vocabulary.
3. Structure of the IRDS Family of Standards

3.1. ANS X3.138

Most of the functionality of the IRDS is defined by the American
National Standard for Information Systems - Information Resource
Dictionary System, ANS X3.138-1988, which is also Federal
Information Processing Standard (FIPS) Publication 156. The
parts of this standard most relevant to CALS Phase II deal with
the creation, maintenance, and retrieval of two distinct levels
of data. One level of data is used to describe a vocabulary, and
the other level is used to apply that vocabulary in the data
modeling methodologies. The former level is referred to as the
IRD Schema Level, and the latter as the IRD Level. The IRDS
provides for extensibility at the IRD Schema Level to satisfy
the particular requirements of any set of methodologies and
applications. This degree of generality is not free: the first
task in using the IRDS to support concurrent engineering must be
to use the IRD Schema Level to define a standard vocabulary
which will serve as a union among the individual methodologies
and applications. Fortunately, the different data modeling
methodologies have small vocabularies (e.g., a few different
kinds of boxes, lines, and arrows), so this should not be a
particularly difficult task. For example, the following
vocabulary covers the major concepts of the IDEF1X modeling
methodology:

o entity is an IDEF1X entity,

o entity-is-assembled-of-entity is a type of IDEF1X
relationship,

o (cardinality-1, cardinality-2) is a property of an IDEF1X
relationship,

o entity-contains-relationship associates an attribute (or
data element) with an entity; the association may have
various properties (e.g., an attribute may be used-as
"pk" to indicate that it is part of the primary key),

o category is an IDEF1X category,

o entity-is-a-category indicates that an entity is
associated with a particular category, and

o category-set-is-total indicates that the set of entities
is exhaustive.
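
The vocabulary above can be pictured as a small set of declarations
at the IRD Schema Level. The following Python sketch is only an
illustration under assumed names: MetaType, VOCABULARY, the "kind"
labels, and terms_of_kind are hypothetical and are not defined by
ANS X3.138. It records a subset of the terms together with the kind
of object each one would declare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaType:
    """One vocabulary term declared at the IRD Schema Level (hypothetical model)."""
    name: str     # the term itself
    kind: str     # assumed label: entity-type, relationship-type, or attribute-type
    meaning: str  # short description taken from the list above


# A subset of the IDEF1X vocabulary listed above.
VOCABULARY = [
    MetaType("entity", "entity-type", "an IDEF1X entity"),
    MetaType("entity-is-assembled-of-entity", "relationship-type",
             "a type of IDEF1X relationship"),
    MetaType("cardinality", "attribute-type",
             "a property of an IDEF1X relationship"),
    MetaType("category", "entity-type", "an IDEF1X category"),
    MetaType("entity-is-a-category", "relationship-type",
             "associates an entity with a particular category"),
]


def terms_of_kind(kind):
    """Return the names of all vocabulary terms of the given kind, in order."""
    return [term.name for term in VOCABULARY if term.kind == kind]
```

A second methodology would add its own terms to the same list, which
is the sense in which the vocabulary serves as a union among
methodologies.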

The next task is then to use that vocabulary within the data
modeling methodologies to define the three types of schemas.
For example, the following objects serve to group entities into
the indicated schemas:

o conceptual-schema-view collects together the entities in
the conceptual schema, and

o LSA-external-schema-view collects together the entities
in the external schema used for LSA.

The  following  are  examples  of  entities,  relationships,  and 
attributes  in  a schema: 

o product-item is-assembled-of product-item-usage
cardinality = (parent="1", child="n") indicates that one
item may be assembled from many simpler items,

o product-item contains part-number used-as = "pk"
indicates that the primary key of item is the part
number, and

o geometric-model is-a geometric-model-category and
geometric-model-category can-be-a wire-frame-model indicate
that a geometric model can be a wire frame model or some
other type of model which may or may not be enumerated
explicitly.
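
As a rough illustration of how such statements and schema views
might be held at the IRD Level, the following Python sketch stores
some of the example statements as records and filters them by view.
The Statement record, the VIEWS membership sets, and
statements_in_view are assumptions made for this sketch; the paper
does not say which entities belong to which view.

```python
from collections import namedtuple

# One IRD Level statement: subject, relationship, object, and its properties.
Statement = namedtuple("Statement", "subject relationship object properties")

IRD_LEVEL = [
    Statement("product-item", "is-assembled-of", "product-item-usage",
              {"cardinality": {"parent": "1", "child": "n"}}),
    Statement("geometric-model", "is-a", "geometric-model-category", {}),
    Statement("geometric-model-category", "can-be-a", "wire-frame-model", {}),
]

# Hypothetical view membership, chosen only to make the example concrete.
VIEWS = {
    "conceptual-schema-view": {
        "product-item", "product-item-usage", "geometric-model",
        "geometric-model-category", "wire-frame-model",
    },
    "LSA-external-schema-view": {"product-item", "product-item-usage"},
}


def statements_in_view(view_name):
    """Return the statements whose subject and object both belong to the view."""
    members = VIEWS[view_name]
    return [s for s in IRD_LEVEL
            if s.subject in members and s.object in members]
```

Under these assumed memberships, the LSA external schema sees only
the assembly relationship, while the conceptual schema sees all
three statements.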

A particularly  important  feature  of  the  IRDS  is  the  ability  to 
provide  user-oriented  names.  For  example,  associated  with 
product-item  may  be  the  following: 

o identification-names = (alternate-name = "LSA product
item", alternate-name-context = "LSA"), which could
provide a special name for product-item in the LSA
external schema,

o descriptive-name = some-very-long-descriptive-name-
which-is-readable-but-not-easily-typed-or-accepted-by-
programming-languages-or-DBMSs provides an alternative to
the normal short access name, and

o some-general-name (LSA-SQL-variation: 2) indicates that
this is revision 2 of a general object which has been
specialized to the use of SQL in an LSA application.
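
The idea of context-dependent names can be sketched as a simple
lookup. In the Python fragment below, the NAMES table and the
display_name function are hypothetical; only the access name
product-item, the alternate name "LSA product item", and the
context "LSA" come from the example above.

```python
# Alternate names keyed by access name, then by context (assumed layout).
NAMES = {
    "product-item": {
        "alternate_names": {"LSA": "LSA product item"},
    },
}


def display_name(access_name, context=None):
    """Return the alternate name for the context, else the access name itself."""
    entry = NAMES.get(access_name, {})
    return entry.get("alternate_names", {}).get(context, access_name)
```

A tool working in the LSA external schema would call
display_name("product-item", "LSA"), while a tool with no special
context would fall back to the short access name.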

3.2.  IRDS  Services  Interface 

ANS X3.138 defines two interfaces which can be used by people to
define and populate a repository for schema definitions, but
there are no standard capabilities for constructing analyses,
diagrams, and so on for the schemas. This is appropriate, given
that there are no standard methodologies. Other tools, dependent
on particular methodologies, must request data from the IRDS to
provide these capabilities. The draft standard for the IRDS
Services Interface defines functionality for communicating
between the IRDS and other software, such as a diagramming
program or a DBMS. The Services Interface is a program call
interface which complements the human interfaces defined in ANS
X3.138. This interface will provide the active dictionary
support that is required to control the DBMS, as well as
allowing continued use and integration of existing support
tools for IDEF0, IDEF1X, and other methodologies.

3.3.  IRDS  Export/Import  File  Format 

ANS  X3.138  also  specifies  functionality  for  exporting  all  or  part 
of  a dictionary  into  a file,  checking  a second  dictionary  for 
compatibility  with  that  file,  and  optionally  importing  that  file 
into  the  second  dictionary.  However,  ANS  X3.138  does  not  specify 
the  format  of  the  file;  this  is  the  subject  of  the  final  critical 
element  in  the  IRDS  family  of  standards,  the  draft  standard  for 
the  IRDS  Export/Import  File  Format.  This  defines  an  efficient 
and  reliable  format  for  communicating  mass  data,  such  as  an 
entire  database  schema  or  collection  of  database  schemas,  from 
one  dictionary  to  another.  ANS  X3.138  in  conjunction  with  the 
IRDS  Export/Import  File  Format  standard  can  be  used  to  compare 
one  dictionary  with  another,  as  well  as  to  populate  an  empty 
dictionary.  The  standards  will  therefore  provide  the  required 
capability  of  dictionary  coordination. 
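
The export, compatibility check, and import sequence described above
can be sketched as follows. This is a simplified illustration, not
the IRDS Export/Import File Format: JSON is used purely as a
stand-in for the file format, dictionaries are modeled as plain
Python dicts, "compatibility" is reduced to the absence of
conflicting definitions, and all function names are hypothetical.

```python
import json


def export_dictionary(dictionary, names=None):
    """Write all of a dictionary, or only the named entries, to a string."""
    selected = dictionary if names is None else {n: dictionary[n] for n in names}
    return json.dumps(selected, sort_keys=True)


def check_compatibility(target, exported):
    """Return the names whose exported definitions conflict with the target."""
    incoming = json.loads(exported)
    return sorted(name for name, definition in incoming.items()
                  if name in target and target[name] != definition)


def import_dictionary(target, exported):
    """Import the file only if it is compatible; report whether it was imported."""
    if check_compatibility(target, exported):
        return False
    target.update(json.loads(exported))
    return True


# Usage: populate an empty dictionary at a second site from the first.
site_a = {"product-item": {"contains": ["part-number"]}}
site_b = {}
exported_file = export_dictionary(site_a)
imported = import_dictionary(site_b, exported_file)
```

The same check_compatibility call, run against a populated
dictionary, gives the comparison of one dictionary with another that
dictionary coordination depends on.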

3.4.  Categorization  of  the  IRDS  and  Other  Standards 

The  standards  needed  by  CALS  Phase  II  can  be  divided  into  two 
general  categories: 

o Data standards represent the "business rules" of CALS,
and are developed by the CALS Office, the military
Services and Agencies, and defense contractors. They
are represented by the contents of two layers of the
IRDS:

oo IRD Schema Layer represents modeling rules, and

oo IRD Layer represents the schemas and related
information.

o Technical standards represent the standard "tools" for
building CALS database systems. These are currently the
following:

oo IRDS ANS X3.138,

oo proposed IRDS Services Interface,

oo proposed IRDS Export/Import File Format,

oo SQL for relational database management at a
particular site, and

oo proposed Remote Database Access (RDA) for executing
a query at a remote site.
4. Availability of the IRDS and Data Management Standards

The  diagram  on  Timelines  of  Data  Standards  Needed  by  CALS  and 
PDES  is  intended  to  demonstrate  that  data  management  standards 
and  commercial  implementations  are  expected  to  be  available  when 
they  are  needed  to  satisfy  CALS  requirements.  SQL  is  evolving 
through SQL2 to SQL3, which is expected to be object-oriented and
capable  of  effectively  managing  text,  graphics,  and  other  CALS 
data  types.  Remote  Database  Access  (RDA)  will  provide  a 
primitive  capability  for  sending  a query  to  a remote  database  and 
receiving  data  and  status  information  in  response. 
5. An Architecture for Distributed Database

RDA  is  insufficient  for  true  distributed  database  management, 
since  it  does  not  deal  with  such  issues  as  query  decomposition 
(where  a single  query  requests  data  from  multiple  databases)  or 
transaction  management  (needed  to  resolve  simultaneous  accesses 
and updates to the same data). The following is a possible
layered  architecture  for  standards  that  would  provide  the 
required  services: 

6. DBMS syntax and presentation transparency
5. location and performance transparency
4. transaction transparency
3. communication transparency
2. mapping from global conceptual schema into local internal
schema
1. database access

The  users  interact  with  Layer  6,  which  isolates  them  from  details 
of  names,  formats,  and  DBMS  syntax  at  the  various  databases. 
Each  user  seems  to  be  dealing  with  a single,  dedicated  database 
specific  to  his  or  her  requirements.  IRDS  support  of  a three- 
schema  architecture  is  essential  to  map  from  the  user's  local 
external  schema  into  the  global  conceptual  schema.  Layer  5 
performs  services  such  as  query  decomposition  and  optimization, 
in  order  to  effectively  deal  with  data  distribution.  Directory 
and  schema  information  is  essential.  Layer  4 provides  for 
detection  and  resolution  of  conflicts  in  accessing  and  updating 
data.  Layer  3 handles  the  details  of  communicating  with  remote 
sites.  RDA  is  positioned  in  Layer  3.  Layers  2 and  1 provide 
services  at  a particular  remote  site  to  satisfy  a particular 
piece  of  the  original  query  or  update.  IRDS  services  are  again 
required  to  provide  support  for  the  three-schema  architecture. 
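
To make the Layer 5 service of query decomposition more concrete,
the following Python sketch splits a request for several data items
into one sub-request per site, using directory information. The
DIRECTORY contents, the site names, and the decompose function are
hypothetical, and a real decomposition would also have to handle
optimization, joins across sites, and replicated data.

```python
from collections import defaultdict

# Hypothetical directory: which site holds each data item.
DIRECTORY = {
    "product-item": "site-A",
    "part-number": "site-A",
    "geometric-model": "site-B",
}


def decompose(requested_items):
    """Group the requested items by the site that holds them."""
    per_site = defaultdict(list)
    for item in requested_items:
        per_site[DIRECTORY[item]].append(item)
    return dict(per_site)


# Each value would be sent to its site through the lower layers.
sub_requests = decompose(["product-item", "geometric-model", "part-number"])
```

The sketch shows why directory and schema information is needed at
this layer: without it, the request cannot be routed at all.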

New  standards  will  be  required  for  Layers  6,  5,  4,  and  3,  while 
IRDS  and  SQL  may  require  extensions  to  deal  with  Layer  2. 
Layer  1 provides  operating  system  services  that  are  identical  for 
distributed  or  centralized  databases. 
6. Development Tasks

The  following  are  some  of  the  near-term  tasks  required  for  CALS 
Phase  II: 

o Development of the vocabulary at the IRD Schema Layer to
support methodologies for modeling the various
applications.

o Development of the vocabulary at the IRD Schema Layer to
support modeling distributed databases.

o Population of the IRD Layer with application models.

o Population of the IRD Layer with distributed database
models.

o Interfacing Computer-Aided Software Engineering (CASE)
tools to the IRDS, in order to assist in development and
analysis of models.

o Development of standard ways of using the IRDS to support
integrated process and data modeling (e.g., through a
standard based on IDEF0 and IDEF1X).

The  following  are  some  of  the  longer-term  tasks  required  for  CALS 
Phase  II: 

o Completion  of  the  architecture  for  distributed  database. 

o Development  of  standards  based  on  that  architecture. 

The  conclusion  is  that  the  IRDS  is  a good  basis  for  CALS  Phase 
II,  but  a large  amount  of  work  is  needed  to  develop  other 
standards  that  will  be  required. 